Improved Alignment-Based Algorithm for Multilingual Text Compression


Related resources

Improved Alignment Based Algorithm for Multilingual Text Compression

Multilingual text compression exploits the existence of the same text in several languages to compress the second and subsequent copies by reference to the first. This is done based on bilingual text alignment, a mapping of words and phrases in one text to their semantic equivalents in the translation. A new multilingual text compression scheme is suggested, which improves over an immediate gen...

Using alignment for multilingual text compression

Multilingual text compression exploits the existence of the same text in several languages to compress the second and subsequent copies by reference to the first. We explore the details of this framework and present experimental results for parallel English and French texts.

An Improved Hierarchical Lossless Text Compression Algorithm

Several improvements to the Bugajski-Russo N-gram algorithm are proposed. When applied to English text, these result in an algorithm with comparable complexity and an approximately 10 to 30% lower rate than the commonly used COMPRESS algorithm.

I. The N-Gram Algorithm

The N-gram algorithm of Bugajski and Russo [1] is a hierarchical dictionary-type universal lossless source coder for a finite source ...

Extending Huffman Coding for Multilingual Text Compression

Traditional text compression algorithms such as Huffman and LZ variants are usually based on 8-bit character sampling. However, under the Unicode representation for multilingual information, the character set of a language such as Chinese or Japanese consists of a very large number of distinct characters, and thus 16-bit or 32-bit character sampling is needed. Consequently, when text compress...

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...


Journal

Journal title: Mathematics in Computer Science

Year: 2012

ISSN: 1661-8270,1661-8289

DOI: 10.1007/s11786-012-0138-1